ABSTRACT Levinthal’s paradox is that finding the native
folded state of a protein by a random search among all possible 
configurations can take an enormously long time. Yet proteins 
can fold in seconds or less. Mathematical analysis of a simple 
model shows that a small and physically reasonable energy bias 
against locally unfavorable configurations, of the order of a few 
kT, can reduce Levinthal’s time to a biologically significant 
size. 


Lectures and articles dealing with protein folding dynamics 
often begin with a reference to the Levinthal “paradox” (1,
2). The main point of this paper is to show by mathematical
analysis of a simple model that Levinthal’s paradox becomes 
irrelevant to protein folding when some of the interactions 
between amino acids are taken into account. 

How long does it take for a protein to fold up into its native 
structure? In a standard illustration of the Levinthal paradox, 
each bond connecting amino acids can have several (e.g.,
three) possible states, so that a protein of, say, 101 amino
acids could exist in 3¹⁰⁰ ≈ 5 × 10⁴⁷ configurations. Even if the
protein is able to sample new configurations at the rate of 10¹³
per second, or 3 × 10²⁰ per year, it will take 10²⁷ years to try
them all. Levinthal concluded that random searches are not
an effective way of finding the correct state of a folded 
protein. Nevertheless, proteins do fold, and in a time scale of 
seconds or less. This is the paradox. 
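
The arithmetic behind this estimate can be checked with a short Python sketch. The conversion factor of about 3.156 × 10⁷ seconds per year and the function name `levinthal_years` are assumptions introduced here for illustration; the number of bonds, states per bond, and sampling rate are the ones quoted in the text.

```python
SECONDS_PER_YEAR = 3.156e7  # assumed conversion factor (about 365.25 days)


def levinthal_years(n_bonds=100, states_per_bond=3, rate_per_second=1e13):
    """Years needed to try every configuration once at a fixed sampling rate."""
    configurations = states_per_bond ** n_bonds
    return configurations / (rate_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"configurations: {float(3 ** 100):.2e}")
    print(f"years to try them all: {levinthal_years():.1e}")
```

The result is of the order of 10²⁷ years, in line with the estimate above.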

A clue to the resolution of the paradox is suggested by 
Dawkins (3) in a discussion of evolution by the accumulation 
of small changes. He gave a more whimsical example of a 
similar paradox: how long will a random search take to 
produce Hamlet’s remark “Methinks it is like a weasel”?
This statement contains 28 characters, including 5 spaces;
and there are 27 possible choices for each location, 26 letters
and a space. A monkey typing randomly would probably
require about 27²⁸ ≈ 10⁴⁰ key strokes. Dawkins observed that
if the monkey cannot change those letters that are already 
correctly in place, Hamlet’s remark may be reached by a 
random search in only a few thousand key strokes. 

In both examples, folding proteins or writing Hamlet, 
biased searches are much more effective than completely 
random searches. Of course this is well known; in protein 
folding simulations, potential energy functions provide the 
necessary bias for Monte Carlo methods (4) and for molecular 
dynamics methods (5). However, these methods rely heavily 
on computation and are not amenable to easy mathematical 
analysis. The goal of this paper is to provide the mathematical 
analysis of Levinthal’s paradox for a highly simplified model 
of protein folding. 

A first-passage time calculation shows that for an unbiased
random search, Levinthal’s protein folding estimate is
essentially correct. But if a modest amount of bias is
introduced, for example by imposing an energy cost of a few kT
for locally incorrect bond configurations, the first-passage
time to the fully correct state can be very much shorter. In
fact, this time can become biologically significant.


Model and Results 


Since the goal is not to understand the folding of any 
particular protein, but only to present an elementary resolution
of Levinthal’s paradox, precise details of the protein
structure will be ignored. Consequently, the model to be 
treated is not expected to be directly useful in the theory of 
protein folding. It allows for only one of the many kinds of 
energetic effects that are known to be involved in folding a 
real protein. 

The protein is a chain of N + 1 amino acids and N bonds. 
The connecting bond between two neighboring amino acids 
can be characterized as “correct” or “incorrect.” (Correct
means native in biology and “Shakespearean” in writing
Hamlet.) There may be several ways that this bond can be
incorrect; these will all be lumped together. Correct bonds
are labeled c, and incorrect bonds are labeled i. A typical
configuration of the chain is cciiciccciic. The “perfect” or
fully correct state is the one consisting of all c’s and no i’s.
The problem treated here is: starting with an arbitrary 
distribution of correct and incorrect bonds, and some rule for 
making changes, find how long it takes to get to the perfect 
chain for the first time. 

The rule for making changes is the main issue. These
changes cannot be entirely random; they must be governed
by physical chemical laws. The simplest nontrivial assumption
one can make is that a correct bond can become incorrect
(c → i) with the rate k₀ and an incorrect bond can become
correct (i → c) with the rate k₁ and that these changes occur
entirely independently. As a result, the number S of incorrect
bonds in the protein configuration changes in time. The
first-passage time to the perfect state is the elapsed time,
starting from some arbitrary initial S, to arrive for the first
time at S = 0. The mean first-passage time τ(S) is the average
of this elapsed time over all ways of getting from S to S = 0.

Then the mean first-passage time from a configuration with
S incorrect bonds to the perfect configuration is approximately

$$\tau(S) = \frac{1}{N k_0}\left(1 + \frac{k_0}{k_1}\right)^{N}. \qquad [1]$$

(The exact result is given later in Eq. 16.) This is asymptotically
correct for large N if k₀ is not too small. The time τ is
essentially independent of the starting S; even if the starting
configuration is close to perfect, there is a significant
probability that it will wander further away before reaching
S = 0. The mean first-passage time for a fully biased search,
where the change c → i is not allowed so that k₀ = 0, is

$$\tau(S) = \frac{1}{k_1}\sum_{J=1}^{S}\frac{1}{J}. \qquad [2]$$


In this limit, τ is independent of N and has a logarithmic
dependence on S. This is the formula to use in connection
with Dawkins’ “weasel.” It gives a value for τ of the order
of 105 generations (one generation is 28 attempts), which is
what one sees in a computer simulation of a fully biased
random search. The derivation of these formulae will be
given later.
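
A fully biased search of this kind can be sketched in Python and set beside Eq. 2. The sketch assumes that in each generation every incorrect character is retyped once at random while correct characters stay locked, and that k₁ corresponds to a success probability of 1/27 per character per generation. These choices, and all function names, are illustrative assumptions rather than a description of the simulation referred to above.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 26 letters and a space


def eq2_mean_time(k1, s):
    """Eq. 2: tau(S) = (1/k1) * sum_{J=1}^{S} 1/J."""
    return sum(1.0 / j for j in range(1, s + 1)) / k1


def generations_to_target(rng):
    """Retype each incorrect character once per generation; correct ones stay locked."""
    text = [rng.choice(ALPHABET) for _ in TARGET]
    generations = 0
    while any(a != b for a, b in zip(text, TARGET)):
        text = [a if a == b else rng.choice(ALPHABET) for a, b in zip(text, TARGET)]
        generations += 1
    return generations


def simulated_mean(trials=500, seed=0):
    """Average number of generations over independent runs (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(generations_to_target(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("Eq. 2 estimate (generations):", eq2_mean_time(1 / 27, len(TARGET)))
    print("simulated mean (generations):", simulated_mean())
```

Eq. 2 treats time as continuous, whereas the simulation advances in discrete generations, so the two numbers are expected to agree only approximately.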

Up to this point, the protein was characterized only by N
and the two rate constants. However, it is useful to make a
specific interpretation of the ratio k₀/k₁. The kinetic scheme
for a single bond is

$$\frac{d}{dt}[c] = -k_0[c] + k_1[i], \qquad [c] + [i] = 1. \qquad [3]$$

The ratio of the rate constants is an equilibrium constant,

$$\frac{[i]_{\mathrm{eq}}}{[c]_{\mathrm{eq}}} = \frac{k_0}{k_1} = K. \qquad [4]$$

Then [c]_eq = 1/(1 + K) and [i]_eq = K/(1 + K). Although the
separate rate constants may involve collision frequencies,
Brownian motion over potential barriers, or other dynamical
effects, K does not. It is strictly thermodynamic. The rate k₀
or k₁ only sets the overall time scale for τ(S).
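
As a small check of Eqs. 3 and 4, the single-bond kinetics can be integrated numerically and the long-time value of [c] compared with 1/(1 + K). The forward-Euler scheme, the step size, and the rate constants in the usage line are arbitrary assumptions made for this sketch.

```python
def relax_single_bond(k0, k1, c0=0.0, dt=1e-3, steps=20000):
    """Forward-Euler integration of d[c]/dt = -k0*[c] + k1*[i], with [c] + [i] = 1."""
    c = c0
    for _ in range(steps):
        i = 1.0 - c
        c += dt * (-k0 * c + k1 * i)
    return c


if __name__ == "__main__":
    k0, k1 = 0.5, 2.0  # illustrative values, so K = k0/k1 = 0.25
    print("long-time [c]:", relax_single_bond(k0, k1))
    print("1/(1 + K):    ", 1.0 / (1.0 + k0 / k1))
```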

The equilibrium constant can be found from statistical
mechanics. Suppose that there are ν + 1 possible kinds of
bond. The correct bond has degeneracy 1 and energy E_c, and
the incorrect bonds have degeneracy ν and energy E_i = E_c +
U. Thus U is an energy penalty for making an incorrect bond.
Then by working out the equilibrium statistical
thermodynamics, one finds

$$K = \frac{k_0}{k_1} = \nu\, e^{-U/kT}. \qquad [5]$$


Discussion


When U = 0, or there is no penalty, the mean first-passage 
time becomes 


$$\tau_L = \frac{1}{N k_0}(\nu + 1)^{N}, \qquad [6]$$


where (ν + 1)^N is the number of possible configurations and
N k₀ is the sampling rate. This is the formula that is usually
used in discussions of Levinthal’s paradox. 

But if there is a penalty, so that k₀/k₁ is small, τ can become
much smaller. This is shown dramatically in Fig. 1. The graph
was drawn using the exact formula for τ(S) given in Eq. 16;
the approximate formula in Eq. 1 gives slightly smaller values
for τ when U/kT is big. This graph is based on N = 100, ν =
2, and S = 66. The rate constants were arbitrarily chosen as
k₁ = 10⁹ s⁻¹ for i → c and k₀ = 2 exp(−U/kT) × 10⁹ s⁻¹ for
c → i. This choice satisfies Eq. 5. As in Metropolis Monte
Carlo simulations, k₁ is taken to be independent of
temperature, so that the entire temperature dependence comes from
the energy penalty in making an incorrect bond. The figure
shows the mean first-passage time, in years, as a function of
U/kT. According to Eq. 2, the first-passage time in the limit
of infinite U/kT is about 1.5 × 10⁻¹⁶ year or 5 × 10⁻⁹ s.
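
The orders of magnitude involved can be reproduced with a short Python sketch that uses the approximate Eq. 1 together with Eq. 5, and Eq. 2 for the fully biased limit. It does not reproduce the figure itself, which was drawn from the exact Eq. 16. The seconds-per-year factor and the function names are assumptions introduced here; N, ν, S, and k₁ are the values quoted above.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # assumed conversion factor


def equilibrium_constant(u_over_kt, nu=2):
    """Eq. 5: K = k0/k1 = nu * exp(-U/kT)."""
    return nu * math.exp(-u_over_kt)


def tau_eq1(u_over_kt, n=100, nu=2, k1=1e9):
    """Eq. 1 in seconds: tau = (1 + K)^N / (N k0), with k0 = K k1."""
    k = equilibrium_constant(u_over_kt, nu)
    return (1.0 + k) ** n / (n * k * k1)


def tau_eq2(s=66, k1=1e9):
    """Eq. 2 in seconds: the fully biased limit, k0 = 0."""
    return sum(1.0 / j for j in range(1, s + 1)) / k1


if __name__ == "__main__":
    for u in (0, 1, 2, 3):
        seconds = tau_eq1(u)
        print(f"U/kT = {u}: {seconds:.2e} s = {seconds / SECONDS_PER_YEAR:.2e} yr")
    print(f"fully biased limit (Eq. 2): {tau_eq2():.2e} s")
```

Because Eq. 1 is only asymptotic and fails when k₀ is too small, the sketch should not be used for large U/kT; the Eq. 2 value gives the limiting behavior there.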

The figure shows that the first-passage time becomes
biologically significant (of the order of 1 second) when U/kT
is greater than about 2. One may argue that the chosen value
of k₁ is only an uninformed guess, but one must remember
that the graph covers a range of more than 40 orders of
magnitude. If k₁ is changed by a few orders of magnitude, the
vertical axis is shifted by that amount. Then the energy at
which the resulting first-passage time is 1 second shifts to a
bit more or a bit less than 2kT. Evidently, reasonable changes
in k₁ do not affect the qualitative conclusion. Levinthal’s time
is greatly reduced by a very modest and physically reasonable
modification in the way that the dynamics is handled.

Fig. 1. Mean first-passage time, in years, as a function of the
energy bias U/kT.
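
The insensitivity to k₁ can be illustrated by solving for the value of U/kT at which the first-passage time equals 1 second. The sketch below uses the approximate Eq. 1 rather than the exact Eq. 16, and a simple bisection on the interval 0 ≤ U/kT ≤ 5, where Eq. 1 decreases monotonically for N = 100 and ν = 2. The trial values of k₁ and the function names are assumptions chosen for illustration.

```python
import math


def tau_eq1(u_over_kt, k1, n=100, nu=2):
    """Eq. 1 in seconds, with K = nu * exp(-U/kT) from Eq. 5 and k0 = K k1."""
    k = nu * math.exp(-u_over_kt)
    return (1.0 + k) ** n / (n * k * k1)


def bias_for_one_second(k1, lo=0.0, hi=5.0, iterations=60):
    """Bisect for the U/kT at which tau_eq1 is 1 s (tau falls as U/kT rises on [lo, hi])."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tau_eq1(mid, k1) > 1.0:
            lo = mid  # still too slow: a larger penalty is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k1 in (1e6, 1e9, 1e12):
        print(f"k1 = {k1:.0e} per second: U/kT = {bias_for_one_second(k1):.2f}")
```

Changing k₁ by three orders of magnitude in either direction moves the crossover by only a few tenths of kT in this approximation.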


Mathematical Derivation 


Now the derivation of the above results is outlined. The 
method, based on the theory of first-passage times, has 
already been applied by Bryngelson and Wolynes (6) in a 
much more ambitious treatment of protein folding. Ref. 7 
gives a useful review of the theory of first-passage times in the 
context of chemical kinetics. Here, emphasis is put on the 
mathematical formulation of the problem and not on details 
of its solution. 

The number of incorrect bonds is S; the number of correct
bonds is N − S. The rate at which S → S + 1 is the number
of correct bonds times the rate k₀ of changing a correct bond
into an incorrect one,

$$\mathrm{rate}(S \to S + 1) = (N - S)\,k_0. \qquad [7]$$

Similarly, the rate at which S → S − 1 is the number of
incorrect bonds times the rate k₁ of changing an incorrect
bond into a correct one,

$$\mathrm{rate}(S \to S - 1) = S\,k_1. \qquad [8]$$
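
These two rates are enough to simulate the process directly. The following Python sketch uses a standard continuous-time stochastic simulation (Gillespie-type), which is a technique introduced here for illustration and not part of the derivation in this paper. The small chain length, rate constants, and trial count in the usage lines are arbitrary assumptions. The estimate is compared with the closed form for τ(1) in Eq. 14.

```python
import random


def first_passage_time(n, s0, k0, k1, rng):
    """One stochastic trajectory of S (number of incorrect bonds) until S = 0."""
    s, t = s0, 0.0
    while s > 0:
        up = (n - s) * k0   # Eq. 7: S -> S + 1
        down = s * k1       # Eq. 8: S -> S - 1
        total = up + down
        t += rng.expovariate(total)  # exponential waiting time to the next change
        s += 1 if rng.random() * total < up else -1
    return t


def mean_first_passage_time(n, s0, k0, k1, trials=4000, seed=1):
    """Average first-passage time over independent trajectories."""
    rng = random.Random(seed)
    return sum(first_passage_time(n, s0, k0, k1, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n, k0, k1 = 10, 0.5, 1.0  # illustrative values only
    exact = ((1.0 + k0 / k1) ** n - 1.0) / (n * k0)  # Eq. 14
    print("simulated tau(1):", mean_first_passage_time(n, 1, k0, k1))
    print("Eq. 14 tau(1):   ", exact)
```

Direct simulation is practical only when τ is short; for N = 100 and weak bias the waiting time is astronomically long, which is the point of the paradox.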


The probability that there are S incorrect bonds at time t is
denoted by P(S, t). This changes by gains from S − 1 and S
+ 1 and losses to S − 1 and S + 1. The gain-loss or master
equation is

$$\begin{aligned}
\frac{d}{dt}P(S,t) ={}& (N - S + 1)\,k_0\,P(S-1,t) + (S+1)\,k_1\,P(S+1,t) \\
& - (N-S)\,k_0\,P(S,t) - S\,k_1\,P(S,t). \qquad [9]
\end{aligned}$$

The end points S = 0 and S = N are handled by requiring that
P(−1, t) and P(N + 1, t) are both equal to 0.

The standard procedure for using a master equation to find
mean first-passage times is as follows. Write the differential
equations for P in matrix form as

$$\frac{d}{dt}P(S,t) = \sum_{S'} W(S,S')\,P(S',t). \qquad [10]$$

Impose an absorbing boundary condition at S = 0, so that
only the states S = 1 to N are involved. Then the fundamental
equation that determines the mean first-passage times is

$$\sum_{S_0} \tau(S_0)\,W(S_0,S) = -1, \quad \text{all } S, \qquad [11]$$

or, more explicitly,

$$S k_1[\tau(S-1) - \tau(S)] + (N-S)k_0[\tau(S+1) - \tau(S)] = -1, \qquad [12]$$

for all S between 1 and N. It is obvious that τ(0) must vanish
and τ(N + 1) is never needed. This determines all the other
τ(S).
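
Eq. 12 with τ(0) = 0 can be solved numerically by working with the differences τ(S + 1) − τ(S), starting from S = N where the k₀ term drops out. The Python sketch below does this in exact rational arithmetic so that the result can be substituted back into Eq. 12 without rounding error. The function name and the small parameter values in the usage lines are assumptions made for illustration.

```python
from fractions import Fraction


def mean_first_passage_times(n, k0, k1):
    """Solve Eq. 12 for tau(0), ..., tau(N) with tau(0) = 0, in exact arithmetic."""
    k0, k1 = Fraction(k0), Fraction(k1)
    # delta[s] = tau(s + 1) - tau(s). Eq. 12 rearranges to
    #   s*k1*delta[s - 1] = 1 + (n - s)*k0*delta[s],
    # and the k0 term vanishes at s = n.
    delta = [Fraction(0)] * n
    delta[n - 1] = 1 / (n * k1)
    for s in range(n - 1, 0, -1):
        delta[s - 1] = (1 + (n - s) * k0 * delta[s]) / (s * k1)
    tau = [Fraction(0)]
    for d in delta:
        tau.append(tau[-1] + d)
    return tau


if __name__ == "__main__":
    tau = mean_first_passage_times(10, Fraction(1, 2), 1)  # illustrative values
    for s in (1, 5, 10):
        print(s, float(tau[s]))
```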

It is not hard to solve these equations. The procedure is
analogous to what one does in finding mean first-passage
times from the Smoluchowski equation. One first solves for
the differences Δ(S) = τ(S + 1) − τ(S), with Δ(0) = τ(1)
and N k₁ Δ(N − 1) = 1, and then sums the Δ(S) to get τ(S).
The solution, easily verified by substitution, is

$$\tau(S) = \frac{1}{N k_0}\sum_{n=0}^{S-1}\binom{N-1}{n}^{-1}\sum_{m=n+1}^{N}\binom{N}{m}K^{m-n}. \qquad [13]$$

In particular,

$$\tau(1) = \frac{1}{N k_0}\left[(1 + K)^{N} - 1\right]. \qquad [14]$$
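
The substitution check can be carried out mechanically. The Python sketch below evaluates the double sum of Eq. 13 in exact rational arithmetic and inserts it into Eq. 12. The function names and the parameter values in the usage lines are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def tau_eq13(s, n, k0, k1):
    """Eq. 13 evaluated exactly; returns 0 for s = 0."""
    k0, k1 = Fraction(k0), Fraction(k1)
    big_k = k0 / k1
    total = Fraction(0)
    for j in range(s):  # j is the outer summation index n of Eq. 13
        inner = sum(comb(n, m) * big_k ** (m - j) for m in range(j + 1, n + 1))
        total += inner / comb(n - 1, j)
    return total / (n * k0)


def eq12_residual(s, n, k0, k1):
    """Left-hand side of Eq. 12 plus 1; zero if Eq. 13 satisfies Eq. 12 at this S."""
    k0, k1 = Fraction(k0), Fraction(k1)
    lhs = s * k1 * (tau_eq13(s - 1, n, k0, k1) - tau_eq13(s, n, k0, k1))
    if s < n:  # the k0 term carries the factor (N - S), so tau(N + 1) is never needed
        lhs += (n - s) * k0 * (tau_eq13(s + 1, n, k0, k1) - tau_eq13(s, n, k0, k1))
    return lhs + 1


if __name__ == "__main__":
    n, k0, k1 = 12, Fraction(1, 3), 2  # illustrative values
    print([eq12_residual(s, n, k0, k1) for s in range(1, n + 1)])
```

Every residual is exactly zero for the values tried, and the S = 1 case reduces to Eq. 14.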
By using the integral identity

$$\sum_{m=n+1}^{N}\binom{N}{m}K^{m-n} = K(n+1)\binom{N}{n+1}\int_0^1 dx\,(1-x)^{n}(1+Kx)^{N-n-1} \qquad [15]$$

and then changing to the new variable y = (1 − x)/(1 + Kx),
the double sum defining τ(S) may be reduced to a single
integral,

$$\tau(S) = \frac{1}{k_0}(1+K)^{N}K\int_0^1 dy\,\frac{1-y^{S}}{1-y}\,(1+Ky)^{-(N+1)}. \qquad [16]$$
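
The agreement between the double sum and the single integral can be checked numerically. The Python sketch below evaluates Eq. 13 directly and Eq. 16 by composite Simpson's rule. The quadrature rule, the number of panels, and the parameter values are assumptions made for this sketch; a sharply peaked integrand (large N with large K) would need a finer grid.

```python
from math import comb


def tau_eq13(s, n, k0, k1):
    """Eq. 13 (double sum) in floating point."""
    big_k = k0 / k1
    total = 0.0
    for j in range(s):  # j is the outer summation index n of Eq. 13
        inner = sum(comb(n, m) * big_k ** (m - j) for m in range(j + 1, n + 1))
        total += inner / comb(n - 1, j)
    return total / (n * k0)


def tau_eq16(s, n, k0, k1, panels=2000):
    """Eq. 16 (single integral) by composite Simpson's rule; panels must be even."""
    big_k = k0 / k1

    def integrand(y):
        # (1 - y^S)/(1 - y) written as 1 + y + ... + y^(S-1) to avoid 0/0 at y = 1
        geometric = sum(y ** j for j in range(s))
        return geometric * (1.0 + big_k * y) ** (-(n + 1))

    h = 1.0 / panels
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (big_k / k0) * (1.0 + big_k) ** n * total * h / 3.0


if __name__ == "__main__":
    args = (5, 20, 0.3, 1.0)  # S, N, k0, k1: illustrative values
    print("Eq. 13:", tau_eq13(*args))
    print("Eq. 16:", tau_eq16(*args))
```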


For large N, the integral is dominated by the contribution
from small y. It is very weakly dependent on S. Its asymptotic
form for large N is given by

$$\tau(S) \to \frac{1}{N k_0}(1+K)^{N}\left[1 + 1!\,(NK)^{-1} + 2!\,(NK)^{-2} + \cdots\right]. \qquad [17]$$

The S-dependent parts of τ are generally negligible in
comparison with the leading term (1 + K)^N. This is the result
stated in Eq. 1.
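
The quality of the asymptotic form can be gauged by comparing it with a direct numerical solution of Eq. 12. The Python sketch below solves Eq. 12 through the differences τ(S + 1) − τ(S) in floating point and evaluates a truncated version of Eq. 17. The truncation order, the function names, and the parameter values (N = 100, K = 0.27, S = 66, with k₁ set to 1 in arbitrary units) are assumptions chosen for illustration.

```python
from math import factorial


def tau_exact(s, n, k0, k1):
    """tau(S) from the recurrence implied by Eq. 12, in floating point."""
    delta_next, deltas = 0.0, []
    for j in range(n, 0, -1):  # j plays the role of S in Eq. 12
        delta_next = (1.0 + (n - j) * k0 * delta_next) / (j * k1)
        deltas.append(delta_next)  # this value is tau(j) - tau(j - 1)
    deltas.reverse()  # now deltas[j] = tau(j + 1) - tau(j)
    return sum(deltas[:s])


def tau_eq17(n, k0, k1, terms=3):
    """Eq. 17 truncated after the given number of terms; terms=1 is Eq. 1."""
    big_k = k0 / k1
    series = sum(factorial(j) * (n * big_k) ** (-j) for j in range(terms))
    return (1.0 + big_k) ** n / (n * k0) * series


if __name__ == "__main__":
    n, k0, k1, s = 100, 0.27, 1.0, 66
    exact = tau_exact(s, n, k0, k1)
    print("Eq. 1 / exact:            ", tau_eq17(n, k0, k1, terms=1) / exact)
    print("Eq. 17 (3 terms) / exact: ", tau_eq17(n, k0, k1, terms=3) / exact)
```

For these values the leading term alone falls a few percent below the exact result, consistent with the remark that Eq. 1 gives slightly smaller values, and the next two terms of the series account for most of the difference.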

This asymptotic approximation is not valid if k₀ is too
small. In the limit k₀ → 0, the integral in Eq. 16 can be
evaluated easily and leads to Eq. 2.